A Statistical Approach for Similarity Measurement Between Sentences for EBMT

Author

  • Niladri Chatterjee
Abstract

The success of Example-Based Machine Translation (EBMT) depends heavily on the efficiency of its retrieval scheme. The more similar the retrieved sentence is to the input sentence, the easier it is to adapt the retrieved translation to the current requirement. However, there is no suitable scheme for measuring similarity between sentences. This paper reports preliminary results of a similarity measurement scheme based on a linear model whose coefficients are determined by multiple regression. The data for the analysis were collected from a survey of a number of respondents. Three major aspects of similarity, namely pragmatic, syntactic and semantic, have been considered. Each respondent was asked to evaluate the similarity of different sentence pairs, each carefully designed to reflect one of these types of similarity. A statistical analysis of these evaluations reveals general human perception of sentential similarity, which will help in designing a suitable retrieval scheme.
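The abstract does not give the exact form of the linear model or how each aspect is scored, so the following is only a minimal sketch under stated assumptions: it assumes the overall similarity rating of a sentence pair is modelled as an intercept plus a weighted sum of three per-aspect scores (pragmatic, syntactic, semantic), and it fits the weights by ordinary least squares, the standard multiple regression estimator. The function names `fit_similarity_model` and `predict`, the coefficient values in `TRUE_COEFFS`, and the feature rows are all hypothetical; the ratings are synthetic values generated from those chosen coefficients, not survey data from the paper.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations are applied to b as well.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Choose the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_similarity_model(features: Sequence[Sequence[float]],
                         ratings: Sequence[float]) -> List[float]:
    """Hypothetical OLS fit of rating = b0 + b1*prag + b2*syn + b3*sem.

    Solves the normal equations (X^T X) b = X^T y and returns [b0, b1, b2, b3].
    """
    # Prepend a constant 1 to every row so b0 acts as the intercept.
    x_rows = [[1.0, *map(float, f)] for f in features]
    k = len(x_rows[0])
    xtx = [[sum(row[i] * row[j] for row in x_rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(x_rows, ratings)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs: Sequence[float], feature: Sequence[float]) -> float:
    """Predicted overall similarity for one sentence pair's aspect scores."""
    return coeffs[0] + sum(c * f for c, f in zip(coeffs[1:], feature))


# Synthetic example: (pragmatic, syntactic, semantic) scores per sentence pair.
TRUE_COEFFS = [0.5, 0.2, 0.3, 0.4]  # hypothetical, chosen for illustration
features = [(1, 2, 3), (2, 1, 4), (3, 5, 1), (4, 2, 2), (5, 4, 5), (2, 3, 2)]
ratings = [predict(TRUE_COEFFS, f) for f in features]

coeffs = fit_similarity_model(features, ratings)
# Because the ratings are noise-free, the fit should recover TRUE_COEFFS up to rounding.
print([round(c, 6) for c in coeffs])
```

Solving the normal equations directly is adequate for three predictors; with many predictors or nearly collinear aspect scores, a dedicated least-squares solver such as `numpy.linalg.lstsq` is the numerically safer choice.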

Similar resources

Searching Similar (Sub)Sentences for Example-Based Machine Translation

Translation is a repetitive activity. The attempt to automate such a difficult task has been a long-term scientific dream; in the past years research in this field has acquired a growing interest, making some forms of Machine Translation (MT) a reality. Among the several types of approaches in MT, one of the most promising paradigms is MAHT and, in particular, Example-Based Machine Translation ...

Identifying Synonymous Expressions From A Bilingual Corpus For Example-Based Machine Translation

Example-based machine translation (EBMT) is based on a bilingual corpus. In EBMT, sentences similar to an input sentence are retrieved from a bilingual corpus and then output is generated from translations of similar sentences. Therefore, a similarity measure between the input sentence and each sentence in the bilingual corpus is important for EBMT. If some similar sentences are missed from ret...

Word Selection for EBMT based on Monolingual Similarity and Translation Confidence

We propose a method of constructing an example-based machine translation (EBMT) system that exploits a content-aligned bilingual corpus. First, the sentences and phrases in the corpus are aligned across the two languages, and the pairs with high translation confidence are selected and stored in the translation memory. Then, for a given input sentence, the system searches for fitting examples b...

Using Example-Based MT to Support Statistical MT when Translating Homogeneous Data in a Resource-Poor Setting

In this paper, we address the issue of applying example-based machine translation (EBMT) methods to overcome some of the difficulties encountered with statistical machine translation (SMT) techniques. We adopt two different EBMT approaches and present an approach to augment output quality by strategically combining both EBMT approaches with the SMT system to handle issues arising from the use o...

A novel method for detecting structural damage based on data-driven and similarity-based techniques under environmental and operational changes

The applications of time series modeling and statistical similarity methods to structural health monitoring (SHM) provide promising and capable approaches to structural damage detection. The main aim of this article is to propose an efficient univariate similarity method named as Kullback similarity (KS) for identifying the location of damage and estimating the level of damage severity. An impr...

Publication date: 2006